Due Date: February 1, 2023 by 6pm
Objective: The goal of this project is to write a data story on philosophy using the dataset for the Philosophy Data Project. Applying data mining, statistical analysis and visualization, students should derive interesting findings in this collection of philosophy texts and write a "data story" that can be shared with a general audience.
In this assignment we study the History of Philosophy dataset. Our objective is to understand how the language used in philosophical texts has evolved over time, and in particular how the study of philosophy has been affected by outside factors such as technological advancements, major political and cultural events, and gender and diversity representation.
%pip install wordcloud
%pip install plotly
import numpy as np
import os
from tqdm import tqdm
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud, STOPWORDS
import plotly.express as px
import ast
In this part, we evaluate the general trend in the development of philosophy by analyzing the word usage trend and average sentence length over time.
data_path = "../data/philosophy_data.csv"
data = pd.read_csv(data_path)
data['author'].unique()
data.head()
# compute the average sentence length of each philosopher,
# with authors ordered by their mean original publication date
df = data.groupby("author")["original_publication_date"].mean().sort_values()
authors = list(df.index)
author_by_time = pd.DataFrame(authors, columns=["authors"])
sent_len_by_author = data.groupby("author")["sentence_length"].mean()
author_by_time["sent_len"] = list(sent_len_by_author[authors])
px.line(author_by_time, x="authors", y="sent_len", title="Average sentence length over time")
fig = px.box(data, x="sentence_length", y="author",
             width=600, height=800,
             title="Sentence Length of Philosophers' Writings")
fig.show()
How has the role of women in philosophy changed throughout history? Are philosophers more aware of gender issues in modern times? Does this change coincide with the women's movements in history? We explore these questions in this part.
gender_words = {"woman", "female", "lady", "girl", "madam", "feminism", "gender",
                "patriarchy", "sex", "sexism"}
# order authors by their mean original publication date
df = data.groupby("author")["original_publication_date"].mean().sort_values()
authors = list(df.index)
author_by_time = pd.DataFrame(authors, columns=["authors"])
author_by_time["gender_word_count"] = 0
for i, author in enumerate(authors):
    author_word = data[data["author"] == author]["tokenized_txt"]
    for tokens in author_word:
        # tokenized_txt stores each token list as a string, so parse it first
        tokens_ls = ast.literal_eval(tokens)
        # count the sentence once if it contains any gender-related term
        if gender_words & set(tokens_ls):
            author_by_time.at[i, "gender_word_count"] += 1
log_author_by_time = author_by_time.copy()
log_author_by_time["gender_word_count"] = [0 if x == 0 else np.log2(x) for x in log_author_by_time["gender_word_count"]]
px.line(log_author_by_time, x="authors", y="gender_word_count", title="Gender term frequency over time")
This plot of gender term frequency shows how attention to women and gender in philosophy has changed throughout history. Each value is the log2-scaled number of an author's sentences that contain at least one gender-related term, so it is not adjusted for how much text each author contributes to the dataset. Ancient Greek philosophers such as Plato and Aristotle appear to have discussed gender-related issues more than the philosophers of the following millennia did. Not until the 19th and 20th centuries did Marx, Nietzsche, and Wittgenstein bring gender issues back into public view. The feminist philosopher Beauvoir mentions these terms more than any other philosopher in the dataset. Later philosophers appear more aware of gender issues in their writing, as evidenced by their higher counts of gender terms.
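The counts plotted above are raw numbers of matching sentences, so an author with more text in the dataset can score higher simply by volume. As a rough sketch of one way to check this, the snippet below computes the share of each author's sentences that contain a term instead of the raw count. The function name `term_sentence_share`, the `tokens` column (token lists that are already parsed), and the toy data are assumptions made for illustration; they are not part of the notebook or the dataset.

```python
import pandas as pd


def term_sentence_share(sentences: pd.DataFrame, terms: set) -> pd.Series:
    """Return, per author, the share of sentences containing at least one term.

    Assumes a hypothetical `tokens` column holding already-parsed token lists.
    """
    has_term = sentences["tokens"].apply(lambda toks: bool(terms & set(toks)))
    # the mean of a boolean column is the fraction of True values
    return has_term.groupby(sentences["author"]).mean()


# toy data: author A has twice as many matching sentences as author B,
# but also twice as many sentences overall
toy = pd.DataFrame({
    "author": ["A", "A", "A", "A", "B", "B"],
    "tokens": [["woman", "is"], ["the", "state"], ["a", "girl"],
               ["being"], ["gender"], ["time"]],
})
share = term_sentence_share(toy, {"woman", "girl", "gender"})
print(share)
```

In this toy case the raw counts differ (2 versus 1) while the shares are equal, which is the kind of difference a normalized plot could reveal in the real data.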
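import nltk
from nltk.corpus import stopwords as nltk_stopwords
# the NLTK stop word list must be downloaded once before first use
nltk.download("stopwords", quiet=True)
stopwords_manual = {"one", "two", "thing", "things", "would", "say", "said", "must", "something",
                    "make", "way", "good", "think", "man", "also", "like", "us", "come", "may", "another",
                    "part", "parts", "case", "either"}
# combine the NLTK English stop words with the manual additions; the module is
# imported under an alias so that re-running this cell does not shadow it
stopwords = set(nltk_stopwords.words("english")) | stopwords_manual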
# word cloud
author_text = data[data["author"] == "Beauvoir"]
sentences = []
# iterate through tokenized text
for tokens in author_text["tokenized_txt"]:
    tokens_ls = ast.literal_eval(tokens)
    sentences.append(" ".join(tokens_ls))
# join sentences with a space so that the last word of one sentence
# does not fuse with the first word of the next
author_words = " ".join(sentences)
wordcloud = WordCloud(width=800, height=800,
                      background_color="white",
                      stopwords=stopwords,
                      min_font_size=10).generate(author_words)
# plot the WordCloud image
plt.figure(figsize = (4, 4), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
This word cloud shows how strongly Beauvoir's work focuses on gender issues and feminism.
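A word cloud is hard to read precisely, so it can help to list the most frequent terms alongside it. The sketch below counts tokens with `collections.Counter` after dropping stop words. The helper name `top_terms` and the toy sentences are hypothetical and only illustrate the idea; with the notebook's data, the token lists would come from parsing `tokenized_txt` for one author.

```python
from collections import Counter


def top_terms(token_lists, stop_words, k=5):
    """Return the k most frequent tokens that are not stop words."""
    counts = Counter(
        tok
        for tokens in token_lists
        for tok in tokens
        if tok not in stop_words
    )
    return counts.most_common(k)


# toy input standing in for one author's parsed sentences
toy_sentences = [
    ["the", "woman", "is", "free"],
    ["woman", "and", "man"],
    ["the", "free", "woman"],
]
toy_stop_words = {"the", "is", "and", "man"}
result = top_terms(toy_sentences, toy_stop_words, k=2)
print(result)  # [('woman', 3), ('free', 2)]
```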
One of the most famous philosophers who wrote about political opinions is Marx, so we start by analyzing his writing to see how politics is involved in it.
# word cloud
author_text = data[data["author"] == "Marx"]
sentences = []
# iterate through tokenized text
for tokens in author_text["tokenized_txt"]:
    tokens_ls = ast.literal_eval(tokens)
    sentences.append(" ".join(tokens_ls))
# join sentences with a space so that the last word of one sentence
# does not fuse with the first word of the next
author_words = " ".join(sentences)
wordcloud = WordCloud(width=800, height=800,
                      background_color="white",
                      stopwords=stopwords,
                      min_font_size=10).generate(author_words)
# plot the WordCloud image
plt.figure(figsize = (4, 4), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
political_words = {"politics", "political", "public", "government", "labour", "capital", "capitalism", "price", "product",
                   "money", "communism", "state", "ownership"}
# order authors by their mean original publication date
df = data.groupby("author")["original_publication_date"].mean().sort_values()
authors = list(df.index)
author_by_time = pd.DataFrame(authors, columns=["authors"])
author_by_time["political_word_count"] = 0
for i, author in enumerate(authors):
    author_word = data[data["author"] == author]["tokenized_txt"]
    for tokens in author_word:
        # tokenized_txt stores each token list as a string, so parse it first
        tokens_ls = ast.literal_eval(tokens)
        # count the sentence once if it contains any political term
        if political_words & set(tokens_ls):
            author_by_time.at[i, "political_word_count"] += 1
px.line(author_by_time, x="authors", y="political_word_count", title="Political term frequency over time")
We can observe that the ancient Greek philosophers briefly discuss politics (perhaps when describing their ideal community and model of society). Politics then falls out of favor for a long time, until Adam Smith brings capitalism into philosophy and discusses the relationship between labor and capital. The third peak comes with Hegel and Marx, when the idea of communism enters philosophy. These last two peaks correspond in time to the capitalist and communist revolutions.
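The gender and political analyses above use the same counting loop with a different word set. A minimal sketch of a shared helper is shown below, assuming the column names used in this notebook (`author`, `original_publication_date`, `tokenized_txt`). The function name `sentences_with_terms` and the toy rows are hypothetical.

```python
import ast

import pandas as pd


def sentences_with_terms(data: pd.DataFrame, terms: set,
                         token_col: str = "tokenized_txt") -> pd.Series:
    """Count, per author, the sentences that contain at least one term.

    Authors are returned in order of their mean original publication date.
    """
    # each cell stores a token list as a string, e.g. "['the', 'state']"
    token_sets = data[token_col].apply(lambda s: set(ast.literal_eval(s)))
    hits = token_sets.apply(lambda toks: bool(terms & toks))
    counts = hits.groupby(data["author"]).sum()
    order = (data.groupby("author")["original_publication_date"]
                 .mean().sort_values().index)
    return counts.reindex(order)


# toy rows standing in for the real dataset
toy = pd.DataFrame({
    "author": ["Late", "Early", "Late", "Early"],
    "original_publication_date": [1867, -350, 1867, -350],
    "tokenized_txt": ["['capital', 'grows']", "['the', 'state']",
                      "['labour', 'and', 'capital']", "['virtue']"],
})
counts = sentences_with_terms(toy, {"capital", "state", "labour"})
print(counts)
```

With a helper like this, the gender and political plots would differ only in the word set passed in, which makes it easier to try further topics.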